- Who are we?
- Who are you?
- What is expected?
- Why does this class exist?
- Collection
- Changing computing (Parallel / Cloud)
- Course outline
Postdoc in the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute
PhD Student in the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute



- To understand what, why and how from the moment an image is produced until it is finished (published, used in a report, …)
- To learn how to go from one analysis on one image to 10, 100, or 1000 images (without working 10, 100, or 1000X harder)
Experimental Design: finding the right technique and picking the right dyes and samples has stayed relatively consistent; better techniques lead to more demanding scientists.
Management: storing, backing up, and setting up databases; these processes have become easier and more automated as data magnitudes have increased.
Measurements: the actual acquisition speed of the data has increased wildly due to better detectors, parallel measurement, and new higher intensity sources.
Post Processing: this portion is the most time-consuming and difficult, and has seen minimal improvements over the last years.
| Year | Measurements | Publications |
|---|---|---|
| 2000 | 146 | 67 |
| 2008 | 584 | 110 |
| 2014 | 1031 | 128 |
| 2020 | 1081 | 133 |
To put more real numbers on these scales rather than 'pseudo-publications', the time to measure a terabyte of data is shown in minutes.
| Year | Time to 1 TB in Minutes |
|---|---|
| 2000 | 4096 |
| 2008 | 1092 |
| 2014 | 32 |
| 2016 | 2 |
If you looked at one 1000 x 1000 sized image every second, it would take you
139 hours to browse through a terabyte of data.
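The 139-hour figure can be checked with a little arithmetic. The sketch below assumes 16-bit pixels (2 bytes each) and a decimal terabyte (10^12 bytes); neither is stated above, but together they reproduce the number.

```python
# Assumptions (not stated in the text): 16-bit pixels and 1 TB = 10**12 bytes.
BYTES_PER_PIXEL = 2
TERABYTE = 10**12


def browse_hours(width=1000, height=1000, seconds_per_image=1.0):
    """Hours needed to look at every image in one terabyte of data."""
    bytes_per_image = width * height * BYTES_PER_PIXEL
    n_images = TERABYTE / bytes_per_image
    return n_images * seconds_per_image / 3600


print(f"{browse_hours():.0f} hours")  # 139 hours
```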
| Year | Time to 1 TB | Man power to keep up | Salary Costs / Month |
|---|---|---|---|
| 2000 | 4096 min | 2 people | 25 kCHF |
| 2008 | 1092 min | 8 people | 95 kCHF |
| 2014 | 32 min | 260 people | 3255 kCHF |
| 2016 | 2 min | 3906 people | 48828 kCHF |
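The staffing numbers in the table can be approximated with the same kind of arithmetic. This is a sketch under assumptions that are inferred rather than stated: 16-bit 1000 x 1000 images, one image inspected per second, a salary of 12.5 kCHF per person-month (the ratio of the last two columns), and an unrounded 2016 acquisition time of 128 s, which is the value that matches the 3906 people shown.

```python
# Assumptions: 16-bit 1000 x 1000 images, 1 TB = 10**12 bytes, one image
# inspected per second, and 12.5 kCHF per person-month (inferred from the table).
IMAGES_PER_TB = 10**12 / (1000 * 1000 * 2)  # 500,000 images
KCHF_PER_PERSON_MONTH = 12.5


def people_to_keep_up(minutes_per_tb, seconds_per_image=1.0):
    """People needed to look at images as fast as they are acquired."""
    review_seconds = IMAGES_PER_TB * seconds_per_image
    return review_seconds / (minutes_per_tb * 60)


# 2016 uses 128 s (about 2 min), an assumed unrounded value that matches the table.
for year, minutes in [(2000, 4096), (2008, 1092), (2014, 32), (2016, 128 / 60)]:
    people = people_to_keep_up(minutes)
    cost = people * KCHF_PER_PERSON_MONTH
    print(f"{year}: {people:7.1f} people, {cost:8.1f} kCHF / month")
```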
\[ \textrm{Transistors} \propto 2^{T/(\textrm{18 months})} \]
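As a worked instance of the proportionality above, the following sketch computes the growth factor implied by an 18-month doubling time. The function name and the 16-year example span are chosen here for illustration only.

```python
def transistor_growth(years, doubling_months=18):
    """Relative increase in transistor count after `years` of doubling."""
    return 2 ** (years * 12 / doubling_months)


print(transistor_growth(1.5))  # 2.0: one doubling period
print(transistor_growth(3))    # 4.0: two doubling periods
print(f"{transistor_growth(16):.0f}x over 16 years")
```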
Based on trends from Wikipedia and Intel
_Based on data from https://gist.github.com/humberto-ortiz/de4b3a621602b78bf90d_
There are now many more transistors inside a single computer but the processing speed hasn't increased. How can this be?
http://www-inst.eecs.berkeley.edu/~cs61c/sp14/ “The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007
The figure shows the range of cloud costs (determined by peak usage) compared to a local workstation with utilization shown as the average number of hours the computer is used each week.
The figure shows the cost of a cloud-based solution as a percentage of the cost of buying a single machine. For values below 1, the percentage is written out as a number. The panels distinguish the average time to replacement for the machines, in months.
Here the equal-cost point is shown, where the cloud and local workstations have the same cost. The x-axis is the percentage of resources used at peak time and the y-axis shows the expected usable lifetime of the computer. The color indicates the utilization percentage, and the text on the squares shows this as the number of hours used in a week.
| Lecture | Description | Applications |
|---|---|---|
| 23rd February - Introduction and Workflows | Basic overview of the course, introduction to the basics of images and their acquisition, the importance of reproducibility and why workflows make sense for image processing | Calculating the intensity for a folder full of images |
| 2nd March - Image Enhancement (A. Kaestner) | Overview of what techniques are available for assessing and improving the quality of images, specifically various filters, when to apply them, their side-effects, and how to apply them correctly | Removing detector noise from neutron images to distinguish different materials |
| 9th March - Tutorial on Python and Jupyter (TBA) | An introduction to the Python world of image analysis and the scikit projects | Getting familiar with Python and learning how the basic scikit tools work |
| Lecture | Description | Applications |
|---|---|---|
| 16th March - Basic Segmentation, Discrete Binary Structures | How to convert images into structures, starting with very basic techniques like threshold and exploring several automated techniques | Identify cells from noise, background, and dust |
| 23rd March - Advanced Segmentation | More advanced techniques for extracting structures including basic clustering and classification approaches, and component labeling | Identifying fat and ice crystals in ice cream images |
| Lecture | Description | Applications |
|---|---|---|
| 30th March - Analyzing Single Objects | The analysis and characterization of single structures/objects after they have been segmented, including shape and orientation | Count cells and determine their average shape and volume |
| 6th April - Analyzing Complex Objects | What techniques are available to analyze more complicated objects with poorly defined 'shape' using Distance maps, Thickness maps, and Voronoi tessellation | Separate clumps of cells, analyze vessel networks, trabecular bone, and other similar structures |
| 13th April - Many Objects and Distributions | Extracting meaningful information for a collection of objects like their spatial distribution, alignment, connectivity, and relative positioning | Quantify cells as being evenly spaced or tightly clustered or organized in sheets |
| Lecture | Description | Applications |
|---|---|---|
| 27th April - Statistics and Reproducibility | Making a statistical analysis from quantified image data, and establishing the precision of the metrics calculated, also more coverage of the steps to making an analysis reproducible | Properly determine if and how a cancerous cell differs from a healthy cell |
| 4th May - Dynamic Experiments | Performing tracking and registration in dynamic, changing systems covering object and image based methods | Turning a video of foam flow into metrics like speed, average deformation, and reorganization |
| 11th May - Scaling Up / Big Data | Performing large scale analyses on clusters and cloud-based machines and an introduction of how to work with 'big data' frameworks | Performing large scale analyses using ETH's clusters and Amazon's cloud resources, how to do anything with a terabyte of data |
| Lecture | Description | Applications |
|---|---|---|
| 18th May - Guest Lecture - High Content Screening (M. Prummer) / Project Presentations | How Roche does Microscopy at Scale with High Content Screening and what the important image analysis aspects are | Robust analysis of millions of images for making decisions about pharmaceuticals to pursue |
| 1st June - Guest Lecture - Big Aerial Images with Deep Learning and More Advanced Approaches (J. Montoya) | Applying more advanced techniques from the field of Machine Learning to image processing segmentation and analysis of aerial images specifically Support vector machines (SVM) and Markov Random Fields (MRF) | Identifying houses, streets, and cars in satellite images |
A very abstract definition: A pairing between spatial information (position) and some other kind of information (value).
In most cases this is a 2 dimensional position (x,y coordinates) and a numeric value (intensity)
| x | y | Intensity |
|---|---|---|
| 1 | 1 | 28 |
| 2 | 1 | 13 |
| 3 | 1 | 40 |
| 4 | 1 | 49 |
| 5 | 1 | 18 |
| 1 | 2 | 47 |
This can then be rearranged from a table form into an array form and displayed as we are used to seeing images
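A minimal sketch of that rearrangement, using NumPy and the rows of the table above. Treating the positions as 1-based and putting y on the row axis are conventions chosen here, not something the text prescribes.

```python
import numpy as np

# The (x, y, intensity) rows from the table above; positions are 1-based.
rows = [
    (1, 1, 28), (2, 1, 13), (3, 1, 40),
    (4, 1, 49), (5, 1, 18), (1, 2, 47),
]

width = max(x for x, _, _ in rows)
height = max(y for _, y, _ in rows)

# Rows of the array index y, columns index x; unlisted positions stay NaN.
image = np.full((height, width), np.nan)
for x, y, value in rows:
    image[y - 1, x - 1] = value

print(image)
```

The resulting array can be handed directly to a plotting function such as matplotlib's `imshow` to display it as an image.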
The next step is to apply a color map (also called lookup table, LUT) to the image so it is a bit more exciting
The color map can be defined arbitrarily, based on how we would like to visualize the information in the image
Formally, a lookup table is a function that maps intensity to color: \[ f(\textrm{Intensity}) \rightarrow \textrm{Color} \]
These transformations can also be non-linear, as in the graph below, where the mapping between the intensity and the color is a \(\log\) relationship, meaning the difference between the lower values is much clearer than between the higher ones
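To make the idea of a lookup table concrete, here is a small sketch that builds a linear and a logarithmic gray-scale LUT for 8-bit intensities and applies them by array indexing. The 8-bit range, the use of `log1p` to avoid log(0), and the name `apply_lut` are choices made for this illustration.

```python
import numpy as np

levels = np.arange(256)

# Linear gray LUT: intensity i maps to the color (i, i, i).
linear_lut = np.stack([levels] * 3, axis=1).astype(np.uint8)

# Logarithmic LUT: log1p stretches low intensities and compresses high ones.
log_scaled = np.log1p(levels) / np.log1p(255) * 255
log_lut = np.stack([log_scaled] * 3, axis=1).astype(np.uint8)


def apply_lut(image, lut):
    """Look up a color for every pixel: f(intensity) -> color."""
    return lut[image]


image = np.array([[0, 1, 10], [100, 200, 255]], dtype=np.uint8)
colored = apply_lut(image, log_lut)
print(colored.shape)  # (2, 3, 3): one RGB triple per pixel
```

With the logarithmic table, neighboring low intensities end up many gray levels apart, while neighboring high intensities map to nearly the same color.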
On a real image the difference is even clearer
For a 3D image, the position or spatial component has a 3rd dimension (z if it is spatial, or t if it is a movie)
| x | y | z | Intensity |
|---|---|---|---|
| 1 | 1 | 1 | 67 |
| 2 | 1 | 1 | 100 |
| 3 | 1 | 1 | 69 |
| 1 | 2 | 1 | 72 |
| 2 | 2 | 1 | 63 |
| 3 | 2 | 1 | 34 |
This can then be rearranged from a table form into an array form and displayed as a series of slices
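The same rearrangement works in 3D. The sketch below fills a NumPy volume from the rows of the table above and iterates over its slices; the (z, y, x) axis order is a convention chosen here so that indexing the first axis yields one slice.

```python
import numpy as np

# The (x, y, z, intensity) rows from the table above; positions are 1-based.
rows = [
    (1, 1, 1, 67), (2, 1, 1, 100), (3, 1, 1, 69),
    (1, 2, 1, 72), (2, 2, 1, 63), (3, 2, 1, 34),
]
xs, ys, zs, values = (np.array(col) for col in zip(*rows))

# Index order (z, y, x) so that volume[k] is the k-th slice.
volume = np.full((zs.max(), ys.max(), xs.max()), np.nan)
volume[zs - 1, ys - 1, xs - 1] = values

for k, slice_ in enumerate(volume, start=1):
    print(f"slice z={k}:\n{slice_}")
```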
In the images thus far, we have had one value per position, but there is no reason there cannot be multiple values. In fact, this is what color images are: each position holds red, green, and blue values, and there can even be a fourth channel for transparency (alpha). For clarity, we call the number of dimensions in the spatial position the dimensionality of the image, and the number of values at each position the depth.
| x | y | Intensity | Transparency |
|---|---|---|---|
| 1 | 1 | 51 | 49 |
| 2 | 1 | 52 | 40 |
| 3 | 1 | 44 | 7 |
| 4 | 1 | 40 | 35 |
| 5 | 1 | 25 | 43 |
| 1 | 2 | 19 | 57 |
This can then be rearranged from a table form into an array form and displayed as a series of slices
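In array form, the distinction between dimensionality and depth shows up in the array shape. This sketch stores the two values per position from the table above along a trailing channel axis, which is a common convention but an assumption here.

```python
import numpy as np

# The (x, y, intensity, transparency) rows from the table above.
rows = [
    (1, 1, 51, 49), (2, 1, 52, 40), (3, 1, 44, 7),
    (4, 1, 40, 35), (5, 1, 25, 43), (1, 2, 19, 57),
]
xs = np.array([r[0] for r in rows])
ys = np.array([r[1] for r in rows])
values = np.array([r[2:] for r in rows], dtype=float)

# Shape is (height, width, channels); unlisted positions stay NaN.
image = np.full((ys.max(), xs.max(), values.shape[1]), np.nan)
image[ys - 1, xs - 1] = values

dimensionality = image.ndim - 1  # number of spatial axes
depth = image.shape[-1]          # number of values per position
print(dimensionality, depth)     # 2 2
```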
At each point in the image (black dot), instead of having just a single value, there is an entire spectrum. A selected group of these (red dots) are shown to illustrate the variations inside the sample. While certainly much more complicated, this still constitutes an image and requires the same sort of techniques to process correctly.
| Modality | Impulse | Characteristic | Response | Detection |
|---|---|---|---|---|
| Light Microscopy | White Light | Electronic interactions | Absorption | Film, Camera |
| Phase Contrast | Coherent light | Electron Density (Index of Refraction) | Phase Shift | Phase stepping, holography, Zernike |
| Confocal Microscopy | Laser Light | Electronic Transition in Fluorescence Molecule | Absorption and reemission | Pinhole in focal plane, scanning detection |
| X-Ray Radiography | X-Ray light | Photo effect and Compton scattering | Absorption and scattering | Scintillator, microscope, camera |
| Ultrasound | High frequency sound waves | Molecular mobility | Reflection and Scattering | Transducer |
| MRI | Radio-frequency EM | Unmatched Hydrogen spins | Absorption and reemission | RF coils to detect |
| Atomic Force Microscopy | Sharp Point | Surface Contact | Contact, Repulsion | Deflection of a tiny mirror |
Here the measurement is supposed to be from a typical microscope, which blurs, flips and otherwise distorts the image, but the original representation is still visible
Here the measurement is supposed to be from a diffraction-style experiment, where the data is measured in reciprocal (Fourier) space and can be reconstructed to the original shape
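The reciprocal-space case can be imitated with a discrete Fourier transform. This is an idealized sketch with an invented test object: it keeps the full complex signal, whereas a real detector typically records only intensities, so actual reconstruction is harder than the inverse transform shown here.

```python
import numpy as np

# A hypothetical test object: a bright rectangle on a dark background.
sample = np.zeros((64, 64))
sample[20:40, 25:45] = 1.0

# "Measure" in reciprocal space, then reconstruct with the inverse transform.
measured = np.fft.fft2(sample)
reconstructed = np.fft.ifft2(measured).real

print(np.allclose(reconstructed, sample))  # True
```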
Copyright 2003-2013 J. Konrad in EC520 lecture, reused with permission
\[ \left[\left([b(x,y)*s_{ab}(x,y)]\otimes h_{fs}(x,y)\right)*h_{op}(x,y)\right]*h_{det}(x,y)+d_{dark}(x,y) \]
\(s_{ab}\) is the only information you are really interested in, so it is important to remove or correct for the other components
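A deliberately simplified sketch of such a correction follows. It assumes the blurring terms \(h_{fs}\), \(h_{op}\), and \(h_{det}\) are negligible (delta functions) and reads \(b * s_{ab}\) as a pixel-wise product; both are assumptions made for illustration, and the beam profile, sample, and dark-current values are invented. Under those assumptions, a dark image and an open-beam image (no sample) are enough to recover \(s_{ab}\), a procedure often called flat-field correction.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (32, 32)

# Invented ground truth: a sample that transmits less in its center.
s_ab = np.ones(shape)
s_ab[8:24, 8:24] = 0.5

# Invented beam profile (brighter in the middle) and dark-current offset.
yy, xx = np.mgrid[0:32, 0:32]
b = 100.0 * np.exp(-((xx - 16) ** 2 + (yy - 16) ** 2) / (2 * 20.0 ** 2))
d_dark = rng.uniform(4.0, 6.0, size=shape)

# Simplified forward model: all h terms taken as delta functions.
measured = b * s_ab + d_dark

# Correction using separately recorded dark and open-beam (no sample) images.
open_beam = b + d_dark
recovered = (measured - d_dark) / (open_beam - d_dark)

print(np.allclose(recovered, s_ab))  # True
```

Once the blurring terms matter, the correction additionally requires deconvolution, which this sketch does not attempt.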
For color (non-monochromatic) images the problem becomes even more complicated \[ \int_{0}^{\infty} {\left[\left([b(x,y,\lambda)*s_{ab}(x,y,\lambda)]\otimes h_{fs}(x,y,\lambda)\right)*h_{op}(x,y,\lambda)\right]*h_{det}(x,y,\lambda)}\mathrm{d}\lambda+d_{dark}(x,y) \]
Inspired by: imagej-pres
Which center square seems brighter?
Are the intensities constant in the image?

## Reproducibility

Science demands repeatability, and really wants reproducibility!

- Experimental conditions can change rapidly and are difficult to make consistent
- Animal and human studies are prohibitively time consuming and expensive to reproduce
- Terabyte datasets cannot be easily passed around many different groups
- Privacy concerns can also limit sharing and access to data
Easy to follow the list, anyone with the right steps can execute and repeat (if not reproduce) the soup
Here it is harder to follow and you need to carefully keep track of what is being performed
Clearly a linear set of instructions is ill-suited for even a fairly easy soup; it becomes even more difficult when there are dozens of steps and different pathways
Furthermore a clean workflow allows you to better parallelize the task since it is clear which tasks can be performed independently
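The parallelization point can be sketched by writing a workflow as a dependency graph and grouping the steps into stages. The soup steps below are hypothetical, and the standard-library `graphlib` module (Python 3.9+) is just one convenient way to do this.

```python
from graphlib import TopologicalSorter

# Hypothetical soup workflow: each step maps to the steps it depends on.
workflow = {
    "chop vegetables": set(),
    "boil water": set(),
    "fry onions": {"chop vegetables"},
    "add vegetables": {"boil water", "fry onions"},
    "simmer": {"add vegetables"},
    "season": {"simmer"},
}


def parallel_stages(graph):
    """Group steps into stages; steps within one stage are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for i, stage in enumerate(parallel_stages(workflow), start=1):
    print(i, stage)
```

Any stage with more than one step, here chopping vegetables and boiling water, can be handed to separate workers; the same idea applies when the steps are image-processing operations.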